    Finding the "truncated" polynomial that is closest to a function

    When implementing sufficiently regular functions (e.g., elementary or special functions) on a computing system, we frequently use polynomial approximations. In most cases, the polynomial that best approximates a function (for a given distance and on a given interval) has coefficients that are not exactly representable with a finite number of bits. And yet the polynomial approximations that are actually implemented do have coefficients represented with a finite, and sometimes small, number of bits: this is due to the finiteness of floating-point representations (for software implementations) and to the need for small, hence fast and/or inexpensive, multipliers (for hardware implementations). We therefore consider polynomial approximations for which the degree-$i$ coefficient has at most $m_i$ fractional bits (in other words, it is a rational number with denominator $2^{m_i}$). We provide a general method for finding the best polynomial approximation under this constraint. We then suggest refinements that can be used to accelerate our method. Comment: 14 pages, 1 figure
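To make the constraint concrete, here is a minimal sketch of the naive baseline: round each coefficient to the nearest multiple of $2^{-m_i}$ and measure the resulting error on a grid. This is not the method of the paper, which searches for the best polynomial under the constraint. The function names, the choice of exp on [0, 1/2], the Taylor coefficients, and the bit budgets are all assumptions made for illustration.

```python
from fractions import Fraction
import math

def round_coefficients(coeffs, frac_bits):
    """Round coeffs[i] to the nearest rational with denominator 2**frac_bits[i]."""
    return [Fraction(round(Fraction(c) * 2**m), 2**m)
            for c, m in zip(coeffs, frac_bits)]

def max_error(f, poly, a, b, samples=1001):
    """Largest |f(x) - poly(x)| on a uniform grid of [a, b] (Horner evaluation)."""
    worst = 0.0
    for k in range(samples):
        x = a + (b - a) * k / (samples - 1)
        p = 0.0
        for c in reversed(poly):
            p = p * x + float(c)
        worst = max(worst, abs(f(x) - p))
    return worst

# Hypothetical input: degree-3 Taylor coefficients of exp, with 8, 8, 6 and 4
# fractional bits allowed for the coefficients of degree 0, 1, 2 and 3.
taylor = [1.0, 1.0, 0.5, 1 / 6]
truncated = round_coefficients(taylor, [8, 8, 6, 4])

err_taylor = max_error(math.exp, taylor, 0.0, 0.5)
err_truncated = max_error(math.exp, truncated, 0.0, 0.5)
```

Comparing `err_taylor` with `err_truncated` shows how much accuracy plain rounding gives up; a constrained search of the kind the abstract describes aims to do better than this baseline.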

    Computing Integer Powers in Floating-Point Arithmetic

    We introduce two algorithms for accurately evaluating powers to a positive integer in floating-point arithmetic, assuming a fused multiply-add (fma) instruction is available. We show that our log-time algorithm always produces faithfully rounded results, discuss the possibility of getting correctly rounded results, and show that results correctly rounded in double precision can be obtained if extended precision is available with the possibility to round into double precision (with a single rounding). Comment: Laboratoire LIP : CNRS/ENS Lyon/INRIA/Université Lyon
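The abstract does not spell out the two algorithms, so the following is only a sketch of the general idea of using an fma to track rounding errors while computing a power. It is a simple linear-time compensated product, not the authors' log-time algorithm. The `fma` helper is an assumption: it emulates a fused multiply-add with exact rational arithmetic and ignores overflow and underflow. The name `power_compensated` is hypothetical.

```python
from fractions import Fraction
import math

def fma(a, b, c):
    """Emulated fused multiply-add: a*b + c with a single rounding (assumption)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def power_compensated(x, n):
    """Approximate x**n for an integer n >= 1, carrying a low-order correction."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    h, l = x, 0.0
    for _ in range(n - 1):
        p = h * x             # rounded product
        e = fma(h, x, -p)     # exact rounding error of h*x (barring underflow)
        l = fma(l, x, e)      # propagate the accumulated correction
        h = p
    return h + l

result = power_compensated(1.1, 10)
exact = Fraction(1.1) ** 10   # exact power of the double nearest to 1.1
```

Here `exact` is the true power of the stored floating-point input, so `Fraction(result) - exact` gives the actual error of the sketch for this input.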

    On the error of Computing ab + cd using Cornea, Harrison and Tang's method

    In their book, Scientific Computing on the Itanium, Cornea et al. [2002] introduce an accurate algorithm for evaluating expressions of the form ab + cd in binary floating-point arithmetic, assuming an FMA instruction is available. They show that if p is the precision of the floating-point format and if u = 2^{−p}, the relative error of the result is of order u. We improve their proof to show that the relative error is bounded by 2u + 7u^2 + 6u^3. Furthermore, by building an example for which the relative error is asymptotically (as p → ∞ or, equivalently, as u → 0) equivalent to 2u, we show that our error bound is asymptotically optimal.
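The abstract does not list the steps of the algorithm. The sketch below assumes the form in which it is usually described: each product is split into a rounded value and its exact error using an fma, and the two parts are summed separately. Treat the step order, the function names, and the emulated `fma` helper (exact rational arithmetic, no overflow or underflow handling) as assumptions rather than as the book's exact formulation.

```python
from fractions import Fraction

def fma(a, b, c):
    """Emulated fused multiply-add: a*b + c with a single rounding (assumption)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def ab_plus_cd(a, b, c, d):
    """Evaluate a*b + c*d, compensating the rounding errors of both products."""
    p1 = a * b
    e1 = fma(a, b, -p1)   # exact error of the first product
    p2 = c * d
    e2 = fma(c, d, -p2)   # exact error of the second product
    return (p1 + p2) + (e1 + e2)

# Hypothetical test case with heavy cancellation: a*b = 1 - 2**-60 exactly.
a, b = 1.0 + 2.0**-30, 1.0 - 2.0**-30
c, d = -1.0, 1.0
naive = a * b + c * d
compensated = ab_plus_cd(a, b, c, d)
```

In this example the naive evaluation loses the whole result to cancellation, since a*b rounds to 1, while the compensated version recovers the exact value −2^{−60}.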

    Generating function approximations at compile time

    ISBN: 12-4244-0785-0. ISSN: 1058-6393. Usually, the mathematical functions used in numerical programs are decomposed into elementary functions (such as sine, cosine, exponential, logarithm...), and for each of these functions we use a program from a library. This may have some drawbacks. First, it is frequently a compound function (e.g. log(1 + exp(−x))) that is needed, so that directly building a polynomial or rational approximation for that function (instead of decomposing it) would result in a faster and/or more accurate calculation. Also, at compile time, we might have some information (e.g., on the range of the input value) that could help to simplify the program. We investigate the possibility of directly building accurate approximations at compile time.

    Towards clean primitives in computer arithmetic (Vers des primitives propres en arithmétique des ordinateurs)

    The IEEE-754 standard for floating-point arithmetic specifies the behavior of the four arithmetic operations. A specification of the elementary functions should appear in the coming years. In this article, we examine the benefits that can be drawn from a system whose "numerical primitives" are completely specified.

    Avoiding double roundings in scaled Newton-Raphson division

    When performing divisions using Newton-Raphson (or similar) iterations on a processor with a floating-point fused multiply-add instruction, one must sometimes scale the iterations to avoid over/underflow and/or loss of accuracy. This may lead to double roundings, resulting in output values that may not be correctly rounded when the quotient falls in the subnormal range. We show how to avoid this problem.
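As an illustration of why a double rounding can break correct rounding, here is a small fixed-point sketch (subnormal numbers behave like fixed-point values, which is the only link to the abstract assumed here). It does not reproduce the scaled Newton-Raphson iteration or the authors' fix. The helper name and the chosen value are hypothetical.

```python
from fractions import Fraction

def round_to_bits(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits, ties to even."""
    scale = 2 ** frac_bits
    return Fraction(round(x * scale), scale)

# x = 1.0101111 in binary, slightly below the midpoint 1.011 of 1.01 and 1.10.
x = Fraction(0b10101111, 2**7)

direct = round_to_bits(x, 2)                    # one rounding to 2 fractional bits
twice = round_to_bits(round_to_bits(x, 4), 2)   # 4 fractional bits first, then 2
```

The first rounding of `twice` lands exactly on the midpoint 1.011, and the tie is then broken upward to 1.10, whereas the single rounding correctly returns 1.01.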

    Solving Systems of Linear Equations in Complex Domain : Complex E-Method

    The E-method, introduced by Ercegovac, allows efficient parallel solution of diagonally dominant systems of linear equations in the real domain using simple and highly regular hardware. Since the evaluation of polynomials and certain rational functions can be achieved by solving the corresponding linear systems, the E-method is an attractive general approach for function evaluation. We generalize the E-method to complex linear systems, and show some potential applications such as the evaluation of complex polynomials and rational functions.